Data documentation


Datasheets for AI and medical datasets (DAIMS): a data validation and documentation framework before machine learning analysis in medical research

Marandi, Ramtin Zargari, Frahm, Anne Svane, Milojevic, Maja

arXiv.org Artificial Intelligence

Despite progress in data engineering, data validation and documentation procedures remain inconsistent, causing confusion and technical problems in research involving machine learning. Frameworks such as "Datasheets for Datasets" have been a step forward, but there is still room for improvement in preparing datasets so that they are ready for ML pipelines. Here, we extend that framework to "Datasheets for AI and medical datasets" (DAIMS). Our publicly available solution, DAIMS, provides a checklist of data standardization requirements, a software tool to assist with data preparation, an extended form for documenting data and posing research questions, a data dictionary table, and a flowchart suggesting ML analyses to address the research questions. The checklist consists of 24 common data standardization requirements, a subset of which the tool checks and validates. In addition, we provide a flowchart mapping research questions to suggested ML methods. DAIMS can serve as a reference for standardizing datasets and a roadmap for researchers aiming to apply effective ML techniques in their medical research. DAIMS is available on GitHub and as an online app that automates key aspects of dataset evaluation, facilitating efficient preparation of datasets for ML studies.
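
The abstract does not enumerate its 24 requirements, but the sketch below illustrates the general kind of automated, pre-analysis check such a validation tool might run on a tabular medical dataset (missing values, duplicate rows, constant columns, column naming). This is a minimal illustration, not the actual DAIMS implementation; the function name, file name, and specific checks are assumptions.

```python
# Illustrative dataset-validation sketch (not the DAIMS tool itself):
# a few generic standardization checks run before ML analysis.
import pandas as pd


def basic_dataset_checks(df: pd.DataFrame) -> dict:
    """Return a small report of common data-quality issues in a tabular dataset."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        # Fraction of missing values per column, only for columns that have any.
        "columns_with_missing": {
            col: float(df[col].isna().mean())
            for col in df.columns
            if df[col].isna().any()
        },
        # Columns carrying no information (a single value or all missing).
        "constant_columns": [
            col for col in df.columns if df[col].nunique(dropna=True) <= 1
        ],
        # Columns whose names deviate from a simple snake_case convention.
        "non_snake_case_columns": [
            col for col in df.columns
            if col != col.strip().lower().replace(" ", "_")
        ],
    }


if __name__ == "__main__":
    # Hypothetical file name; substitute your own dataset.
    df = pd.read_csv("patients.csv")
    print(basic_dataset_checks(df))
```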


A Standardized Machine-readable Dataset Documentation Format for Responsible AI

Jain, Nitisha, Akhtar, Mubashara, Giner-Miguelez, Joan, Shinde, Rajat, Vanschoren, Joaquin, Vogler, Steffen, Goswami, Sujata, Rao, Yuhan, Santos, Tim, Oala, Luis, Karamousadakis, Michalis, Maskey, Manil, Marcenac, Pierre, Conforti, Costanza, Kuchnik, Michael, Aroyo, Lora, Benjelloun, Omar, Simperl, Elena

arXiv.org Artificial Intelligence

Data is critical to advancing AI technologies, yet its quality and documentation remain significant challenges, leading to adverse downstream effects (e.g., potential biases) in AI applications. This paper addresses these issues by introducing Croissant-RAI, a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets. Croissant-RAI extends the Croissant metadata format and builds upon existing responsible AI (RAI) documentation frameworks, offering a standardized set of attributes and practices to facilitate community-wide adoption. Leveraging established web-publishing practices, such as Schema.org, Croissant-RAI enables dataset users to easily find and utilize RAI metadata regardless of the platform on which the datasets are published. Furthermore, it is seamlessly integrated into major data search engines, repositories, and machine learning frameworks, streamlining the reading and writing of responsible AI metadata within practitioners' existing workflows. Croissant-RAI was developed through a community-led effort. It has been designed to be adaptable to evolving documentation requirements and is supported by a Python library and a visual editor.
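
Because Croissant-RAI builds on Schema.org-style JSON-LD, a dataset's responsible-AI metadata can be published as an ordinary machine-readable record alongside the data. The sketch below is only indicative: the dataset, the namespace URI, and the `rai:`-prefixed property names are assumptions for illustration; the authoritative vocabulary is defined by the Croissant-RAI specification and its Python library.

```python
# Illustrative Croissant-style JSON-LD record carrying responsible-AI metadata.
# The "rai:" property names and namespace URI below are indicative only; consult
# the Croissant-RAI specification for the exact vocabulary.
import json

record = {
    "@context": {
        "@vocab": "https://schema.org/",
        "rai": "http://mlcommons.org/croissant/RAI/",  # assumed namespace URI
    },
    "@type": "Dataset",
    "name": "example-clinical-notes",  # hypothetical dataset
    "description": "De-identified clinical notes used for triage prediction.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "rai:dataCollection": "Notes exported from a single hospital EHR system in 2022.",
    "rai:dataLimitations": "Single-site data; may not generalize to other hospitals.",
    "rai:personalSensitiveInformation": "Direct identifiers removed before release.",
}

# Serializing to JSON-LD lets search engines, repositories, and ML frameworks
# read the RAI metadata via standard web-publishing practice (Schema.org markup).
print(json.dumps(record, indent=2))
```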


Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata

Heger, Amy K., Marquis, Liz B., Vorvoreanu, Mihaela, Wallach, Hanna, Vaughan, Jennifer Wortman

arXiv.org Artificial Intelligence

Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from Datasheets for Datasets (Gebru et al., 2021). Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Although data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. Based on these findings, we derive seven design requirements for future data documentation frameworks.


The Importance of a Proper Data Culture

#artificialintelligence

Getting started with AI requires a proper data culture. AI is not magic, despite what many may still think. Before even thinking of AI, the data needs to be in order: you need documentation, policies, and, most importantly, a proper data culture. This is the first in a series of interviews with practitioners in the field about generating business value with AI.